
    A New Estimator of Intrinsic Dimension Based on the Multipoint Morisita Index

    The size of datasets has been increasing rapidly, both in the number of variables and in the number of events. As a result, the empty space phenomenon and the curse of dimensionality complicate the extraction of useful information. In general, however, data lie on non-linear manifolds of much lower dimension than that of the spaces in which they are embedded. In many pattern recognition tasks, learning these manifolds is a key issue, and it requires knowledge of their true intrinsic dimension. This paper introduces a new estimator of intrinsic dimension based on the multipoint Morisita index. It is applied to synthetic and real datasets of varying complexity and compared with other existing estimators. The proposed estimator turns out to be fairly robust to sample size and noise, unaffected by edge effects, able to handle large datasets and computationally efficient.
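    The estimator can be sketched as follows, assuming the usual construction of the multipoint Morisita index: data rescaled to the unit hypercube, a grid of Q = (1/δ)^E cells holding n_i points each, I_{m,δ} = Q^(m-1) Σ n_i(n_i-1)…(n_i-m+1) / [N(N-1)…(N-m+1)], and the estimate M_m = E - S_m/(m-1), where S_m is the slope of log I_{m,δ} against log(1/δ). The function names and the default grid sizes are hypothetical choices for illustration, not the authors' code.

```python
import numpy as np


def multipoint_morisita(X, m, cells_per_side):
    """Multipoint Morisita index I_{m,delta} for several grid resolutions.

    X: (N, E) array already rescaled to the unit hypercube [0, 1]^E.
    cells_per_side: values of l = 1/delta; the grid then has Q = l**E cells.
    """
    N, E = X.shape
    indices = []
    for l in cells_per_side:
        # Cell index of every point; clipping keeps points with x == 1 in the last cell.
        cells = np.clip(np.floor(X * l).astype(int), 0, l - 1)
        _, counts = np.unique(cells, axis=0, return_counts=True)
        counts = counts.astype(float)
        num = np.ones_like(counts)
        den = 1.0
        for k in range(m):
            num *= counts - k  # n_i (n_i - 1) ... (n_i - m + 1)
            den *= N - k       # N (N - 1) ... (N - m + 1)
        Q = float(l) ** E
        indices.append(Q ** (m - 1) * num.sum() / den)
    return np.array(indices)


def morisita_id(X, m=2, cells_per_side=(2, 4, 8, 16, 32)):
    """Morisita estimate of intrinsic dimension, M_m = E - S_m / (m - 1).

    S_m is the least-squares slope of log I_{m,delta} against log(1/delta).
    """
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero for constant columns
    unit = (X - lo) / span                   # rescale every coordinate to [0, 1]
    I = multipoint_morisita(unit, m, cells_per_side)
    log_l = np.log(np.asarray(cells_per_side, dtype=float))
    slope = np.polyfit(log_l, np.log(I), 1)[0]
    return X.shape[1] - slope / (m - 1)
```

    For points spread along the diagonal of the unit cube, `morisita_id` gives a value close to 1. The grid sizes should stay within the range where log I is roughly linear in log(1/δ) and where cells still hold several points each; otherwise the index can collapse to zero.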

    Multi-scale support vector algorithms for hot spot detection and modelling

    The algorithmic approach to data modelling has developed rapidly in recent years; in particular, methods based on data mining and machine learning have been used in a growing number of applications. These methods follow a data-driven methodology, aiming to provide the best possible generalization and predictive abilities instead of concentrating on the properties of the data model. One of the most successful groups of such methods is known as Support Vector algorithms. Following the fruitful developments in applying Support Vector algorithms to spatial data, this paper introduces a new extension of the traditional support vector regression (SVR) algorithm. This extension allows environmental data to be modelled simultaneously at several spatial scales. The joint influence of environmental processes presenting different patterns at different scales is learned automatically from the data, providing the optimum mixture of short- and large-scale models. The method adapts to the spatial scale of the data, which makes it an efficient means of modelling local anomalies that typically arise in the early phase of an environmental emergency. However, the proposed approach still requires some prior knowledge of the possible existence of such short-scale patterns, which is a possible limitation for its implementation in early warning systems. The purpose of this paper is to present the multi-scale SVR model and to illustrate its use with an application to mapping 137Cs activity from measurements taken in the Briansk region following the Chernobyl accident.
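    The idea of mixing short- and large-scale models can be illustrated with a kernel that sums two Gaussian RBF kernels of different bandwidths. This is only a sketch: kernel ridge regression (closed form, NumPy only) is swapped in for SVR, and the bandwidths, the fixed mixing weight `short_weight` and the `ridge` term are hypothetical parameters, whereas in the multi-scale SVR described above the contribution of each scale is learned from the data.

```python
import numpy as np


def rbf_kernel(A, B, bandwidth):
    """Gaussian RBF kernel matrix between point sets A (n, d) and B (m, d)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))


def multiscale_kernel(A, B, short_bw, large_bw, short_weight=0.5):
    """Convex combination of a short-scale and a large-scale RBF kernel."""
    return (short_weight * rbf_kernel(A, B, short_bw)
            + (1.0 - short_weight) * rbf_kernel(A, B, large_bw))


def fit_predict(X_train, y_train, X_test, short_bw, large_bw,
                short_weight=0.5, ridge=1e-3):
    """Kernel ridge regression (a stand-in for SVR) with the multi-scale kernel."""
    K = multiscale_kernel(X_train, X_train, short_bw, large_bw, short_weight)
    alpha = np.linalg.solve(K + ridge * np.eye(len(X_train)), y_train)
    K_test = multiscale_kernel(X_test, X_train, short_bw, large_bw, short_weight)
    return K_test @ alpha
```

    Setting `short_weight=0.0` recovers a single large-scale model, which smooths over a narrow hot spot; the mixed kernel can follow both the regional trend and the local anomaly.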

    Long-range fluctuations and multifractality in connectivity density time series of a wind speed monitoring network

    This paper studies the daily connectivity time series of a wind speed-monitoring network using multifractal detrended fluctuation analysis. It investigates long-range fluctuations and multifractality in the residuals of the connectivity time series. Our findings reveal that the daily connectivity of the correlation-based network is persistent for any correlation threshold. Furthermore, the degree of multifractality is higher for larger absolute values of the correlation threshold.
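    A compact sketch of multifractal detrended fluctuation analysis (MF-DFA), assuming first-order (linear) detrending and non-overlapping segments scanned from the start of the series only (the standard procedure also scans from the end). The function name, the scales and the q values are illustrative assumptions. The generalized Hurst exponent h(q) is the slope of log F_q(s) against log s; h(2) > 0.5 indicates persistence, and a spread of h(q) across q indicates multifractality.

```python
import numpy as np


def mfdfa(x, scales, qs, order=1):
    """Generalized Hurst exponents h(q) by MF-DFA (forward segments only)."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())  # integrated, mean-removed series
    qs = np.asarray(qs, dtype=float)
    log_F = np.empty((len(qs), len(scales)))
    for j, s in enumerate(scales):
        n_seg = len(profile) // s
        segments = profile[: n_seg * s].reshape(n_seg, s)
        t = np.arange(s)
        # Fit a polynomial trend to every segment at once (one column per segment).
        coeffs = np.polyfit(t, segments.T, order)       # (order + 1, n_seg)
        trend = np.vander(t, order + 1) @ coeffs        # (s, n_seg)
        f2 = ((segments.T - trend) ** 2).mean(axis=0)   # F^2(v, s) per segment
        for i, q in enumerate(qs):
            if q == 0:
                log_F[i, j] = 0.5 * np.mean(np.log(f2))
            else:
                log_F[i, j] = np.log(np.mean(f2 ** (q / 2.0))) / q
    log_s = np.log(np.asarray(scales, dtype=float))
    # h(q) is the slope of log F_q(s) against log s.
    return np.array([np.polyfit(log_s, row, 1)[0] for row in log_F])
```

    Uncorrelated noise gives h(2) near 0.5, while its cumulative sum gives h(2) near 1.5; the difference between h at the smallest and largest q is one simple way to express the width of the multifractal behaviour.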

    Learning wind fields with multiple kernels

    This paper presents multiple kernel learning (MKL) regression as an exploratory spatial data analysis and modelling tool. The MKL approach is introduced as an extension of support vector regression, in which MKL uses dedicated kernels to divide a given task into sub-problems and treat them separately and effectively. It gives non-linear robust kernel regression better interpretability, at the cost of a more complex numerical optimization. In particular, we investigate MKL as a tool that avoids the use of ad hoc topographic indices as covariates in statistical models of complex terrain. Instead, MKL learns these relationships from the data in a non-parametric fashion. A study on data simulated from real terrain features confirms the ability of MKL to enhance the interpretability of data-driven models and to aid feature selection without degrading predictive performance. We also examine the stability of the MKL algorithm with respect to the number of training samples and to the presence of noise. Finally, the results of a real case study are presented, in which MKL exploits a large set of terrain features computed at multiple spatial scales to predict mean wind speed in an Alpine region.

    Community detection analysis in wind speed-monitoring systems using mutual information-based complex network

    A mutual information-based weighted network representation of an extensive wind speed-monitoring system in Switzerland was analyzed in order to detect communities. Two communities were revealed, corresponding to two clusters of sensors located on the Alps and on the Jura-Plateau, respectively, which define the two major climatic zones of Switzerland. The silhouette measure is used to evaluate the obtained communities and to confirm the membership of each sensor in its cluster.
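    The weighted network can be built from pairwise mutual information between station records. The sketch assumes a simple plug-in histogram estimator with a hypothetical number of bins; the estimator and binning used in the study may differ. Community detection and the silhouette evaluation would then operate on the resulting weight matrix `W`.

```python
import numpy as np


def mutual_information(x, y, bins=16):
    """Plug-in histogram estimate of the mutual information I(X; Y) in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # empty cells contribute nothing
    return float((pxy[nz] * np.log(pxy[nz] / (px * py)[nz])).sum())


def mi_network(series, bins=16):
    """Symmetric weight matrix: W[i, j] is the MI between stations i and j."""
    n = series.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            W[i, j] = W[j, i] = mutual_information(series[i], series[j], bins)
    return W
```

    The histogram estimator is biased upwards for independent series (roughly by (bins - 1)^2 / (2N) nats), so the number of bins should be kept small relative to the record length.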

    Advanced Analysis of Temporal Data Using Fisher-Shannon Information: Theoretical Development and Application in Geosciences

    Complex non-linear time series are ubiquitous in geosciences. Quantifying the complexity and non-stationarity of these data is a challenging task, and advanced complexity-based exploratory tools are required to understand and visualize such data. This paper discusses the Fisher-Shannon method, from which one can obtain a complexity measure and detect non-stationarity, as an efficient data exploration tool. State-of-the-art studies related to the Fisher-Shannon measures are reviewed, and new analytical formulas for positive unimodal skewed distributions are proposed. Case studies on both synthetic and real data illustrate the usefulness of the Fisher-Shannon method, which can find application in different domains, including time series discrimination and the generation of time series features for clustering, modeling and forecasting. The paper is accompanied by Python and R libraries for the non-parametric estimation of the proposed measures.
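    The two Fisher-Shannon quantities can be estimated non-parametrically from a sample. This sketch uses the standard definitions, Shannon entropy power N_X = exp(2 H_X) / (2πe) and Fisher information I_X = ∫ f'(x)^2 / f(x) dx, with a Gaussian kernel density estimate whose bandwidth rule and grid size are assumptions; it is not the Python or R library that accompanies the paper. By Stam's inequality N_X · I_X ≥ 1, with equality for a Gaussian, so values well above 1 indicate departure from Gaussianity.

```python
import numpy as np


def _integrate(values, grid):
    """Trapezoidal rule on a uniform grid."""
    dx = grid[1] - grid[0]
    return dx * (values.sum() - 0.5 * (values[0] + values[-1]))


def fisher_shannon(x, n_grid=1024):
    """Return Shannon entropy power N, Fisher information I and complexity N * I.

    The density is a Gaussian KDE with a Silverman-type rule-of-thumb bandwidth
    (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    h = 1.06 * x.std(ddof=1) * n ** (-1 / 5)
    grid = np.linspace(x.min() - 4 * h, x.max() + 4 * h, n_grid)
    u = (grid[:, None] - x[None, :]) / h
    phi = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    f = phi.sum(axis=1) / (n * h)               # KDE of the density
    df = -(u * phi).sum(axis=1) / (n * h ** 2)  # analytic derivative of the KDE
    f_safe = np.maximum(f, 1e-300)              # guard against log(0) and 0 / 0
    H = -_integrate(f * np.log(f_safe), grid)   # differential Shannon entropy
    I = _integrate(df ** 2 / f_safe, grid)      # Fisher information
    N = np.exp(2 * H) / (2 * np.pi * np.e)      # Shannon entropy power
    return N, I, N * I
```

    Because the bandwidth scales with the sample standard deviation, the product N · I estimated this way does not change when the data are shifted or rescaled, and for a Gaussian sample it stays close to 1.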

    Wavelet Scale Variance Analysis of Wind Extremes in Mountainous Terrains

    The 10-min average wind speed series recorded at 130 stations distributed fairly homogeneously across Switzerland are investigated. Given a percentile-based threshold of the wind speed distribution, a wind extreme is defined by the duration of a sequence of consecutive wind speed values above the threshold. This definition makes it possible to analyze the sequence of extremes as a temporal point process marked by their durations. Representing the sequence of wind extremes by the inter-extreme interval series, the wavelet variance, a useful tool for investigating the variance of a time series across scales, was applied in order to find a link between the wavelet scales and several topographic parameters. Our findings suggest that the mean duration of wind extremes and the mean inter-extreme time are positively correlated, and that this relationship depends on the wind speed threshold. Furthermore, the threshold of the wind speed distribution correlates best with a terrain parameter related to the Laplacian of terrain elevations; in particular, for wavelet scales less than 3, terrain exposure may explain the formation of extreme wind speeds.
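    The definition of extremes and the scale-by-scale variance can be sketched as follows. The 90th-percentile default, measuring inter-extreme intervals between the starts of successive extremes, and the use of a Haar filter with boundary-affected coefficients discarded are all assumptions; the wavelet filter and estimator used in the paper may differ. For unit-variance white noise, the Haar wavelet variance at scale τ_j = 2^(j-1) is 1/2^j, which offers a simple sanity check.

```python
import numpy as np


def wind_extremes(speed, percentile=90.0):
    """Durations of wind extremes and inter-extreme intervals, in samples.

    An extreme is a maximal run of consecutive values above the percentile
    threshold; intervals are measured between the starts of successive runs.
    """
    speed = np.asarray(speed, dtype=float)
    above = (speed > np.percentile(speed, percentile)).astype(int)
    # +1 marks the first sample of a run, -1 the first sample after it.
    edges = np.diff(np.concatenate(([0], above, [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return ends - starts, np.diff(starts)


def haar_wavelet_variance(x, levels):
    """Haar (MODWT-style) wavelet variance at scales tau_j = 2**(j - 1), j = 1..levels.

    Only coefficients that do not touch the series boundaries are averaged.
    """
    x = np.asarray(x, dtype=float)
    c = np.concatenate(([0.0], np.cumsum(x)))  # c[k] = x[0] + ... + x[k-1]
    variances = []
    for j in range(1, levels + 1):
        tau = 2 ** (j - 1)
        recent = c[2 * tau:] - c[tau:-tau]     # sum of the latest tau values
        older = c[tau:-tau] - c[:-2 * tau]     # sum of the preceding tau values
        w = (recent - older) / 2 ** j          # Haar wavelet coefficients
        variances.append(np.mean(w ** 2))
    return np.array(variances)
```

    In this setting, `haar_wavelet_variance` would be applied to the inter-extreme interval series returned by `wind_extremes`.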

    A novel framework for spatio-temporal prediction of environmental data using deep learning

    As the role played by statistical and computational sciences in climate and environmental modelling and prediction grows, Machine Learning researchers are becoming increasingly aware of the relevance of their work in helping to tackle the climate crisis. Indeed, being universal nonlinear function approximators, Machine Learning algorithms are efficient at analysing and modelling spatially and temporally variable environmental data. While Deep Learning models have proved able to capture spatial, temporal, and spatio-temporal dependencies through automatic feature representation learning, the interpolation of continuous spatio-temporal fields measured at a set of irregularly spaced points is still under-investigated. To fill this gap, we introduce a framework for spatio-temporal prediction of climate and environmental data using deep learning. Specifically, we show how spatio-temporal processes can be decomposed into a sum of products of temporally referenced basis functions and stochastic spatial coefficients, which can be spatially modelled and mapped on a regular grid, allowing the reconstruction of the complete spatio-temporal signal. Applications to two case studies based on simulated and real-world data show the effectiveness of the proposed framework in modelling coherent spatio-temporal fields.
    Comment: 11 pages, 8 figures.
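    The decomposition into temporal basis functions and spatial coefficients, followed by gridding and reconstruction, can be illustrated with two deliberately simple stand-ins: a truncated singular value decomposition (empirical orthogonal functions) replaces whatever bases the framework uses, and inverse-distance weighting replaces the deep learning model that maps the coefficients. Function names and parameters are hypothetical.

```python
import numpy as np


def decompose(Z, n_components):
    """Write Z[s, t] ~ sum_k A[s, k] * B[k, t] via a truncated SVD.

    Z: (n_stations, n_times) array. Rows of B are temporal basis functions,
    columns of A are the spatial coefficients attached to each station.
    """
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    B = Vt[:n_components]                        # temporal basis functions
    A = U[:, :n_components] * S[:n_components]   # spatial coefficients
    return A, B


def idw(coords, values, targets, power=2.0):
    """Inverse-distance weighting, a simple stand-in for the spatial model."""
    d = np.linalg.norm(targets[:, None, :] - coords[None, :, :], axis=-1)
    w = 1.0 / np.maximum(d, 1e-12) ** power
    return (w @ values) / w.sum(axis=1, keepdims=True)


def predict_field(Z, coords, grid, n_components):
    """Map every spatial coefficient to the grid, then recombine with the bases."""
    A, B = decompose(Z, n_components)
    A_grid = idw(coords, A, grid)   # (n_grid, n_components)
    return A_grid @ B               # (n_grid, n_times)
```

    The appeal of this structure is that a purely spatial model is fitted once per coefficient, while the temporal behaviour is carried entirely by the shared basis functions.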

    Numerical Experiments with Support Vector Machines

    The report presents a series of numerical experiments on the application of Support Vector Machines (SVM) to two-class spatial data classification. The main attention is paid to the variability of the results as the hyperparameters change: the bandwidth of the radial basis function kernel and the C parameter. Training error, testing error and the number of support vectors are plotted against the hyperparameters. The number of support vectors is minimal at the optimal solution. Two real case studies are considered: Cd contamination in Lake Leman and radionuclide soil contamination in the Briansk region. Structural analysis (variography) is used to describe the spatial patterns obtained and to monitor the performance of the SVM.

    Spatial Data Mapping with Support Vector Regression

    The paper deals with a novel application of Support Vector Regression (SVR) to the analysis and modelling of spatially distributed environmental data. Mapping of soil pollution is considered as a real case study. Variography is widely used to control the performance of the machines. Geostatistical interpretations of the SVR hyperparameters are given. The results obtained demonstrate the flexibility and efficiency of SVR applied to spatial data.
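    Variography, used in the two support vector reports above to monitor the machines, starts from the experimental semivariogram γ(h) = 1/(2N(h)) Σ (z_i - z_j)^2, summed over the N(h) pairs whose separation falls in lag bin h. A minimal sketch, assuming isotropy and user-chosen lag bins (hypothetical parameters), follows; comparing the variogram of the data with that of the model residuals is one common way to check whether the spatial structure has been captured.

```python
import numpy as np


def experimental_variogram(coords, values, lag_edges):
    """Isotropic experimental semivariogram and pair counts for each lag bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)   # every unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    n_bins = len(lag_edges) - 1
    gamma = np.full(n_bins, np.nan)            # NaN marks bins without pairs
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        in_bin = (dist >= lag_edges[b]) & (dist < lag_edges[b + 1])
        counts[b] = in_bin.sum()
        if counts[b]:
            gamma[b] = sqdiff[in_bin].mean() / 2.0
    return gamma, counts
```

    Residuals that retain no spatial structure produce a flat variogram (a pure nugget) at the level of their variance.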